Toward the Evaluation of Machine Translation Using Patent Information
Abstract
To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 2 000 000 sentence pairs in Japanese and English, which were extracted automatically from our parallel corpus. These sentence pairs can be used to train and evaluate machine translation systems. Our test collection also includes search topics for cross-lingual patent retrieval, which can be used to evaluate the contribution of machine translation to retrieving patent documents across languages. This paper describes our test collection, methods for evaluating machine translation, and preliminary experiments.
Similar resources
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Extrinsic Evaluation of Patent MT: Review and Commentary
There has been a long history of work on the application of Machine Translation (MT) to support cross-language information access for patent collections. Much of this work has leveraged fairly traditional information retrieval evaluation designs as a basis for extrinsic (i.e., task-based) evaluation, but other evaluation designs are also possible. This survey reviews the work to date on extrins...
Exploiting Patent Information for the Evaluation of Machine Translation
We have produced a test collection for machine translation (MT). Our test collection includes approximately 2 000 000 sentence pairs in Japanese and English, which were extracted from patent documents and can be used to train and evaluate MT systems. Our test collection also includes search topics for crosslingual information retrieval, to evaluate the contribution of MT to retrieving patent do...
Overview of the Patent Translation Task at the NTCIR-7 Workshop
To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation and performed the Patent Translation Task at the Seventh NTCIR Workshop. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 2 000 000 sen...
ISTIC Statistical Machine Translation System for Patent machine translation in NTCIR-9
This paper describes the statistical machine translation system of ISTIC used in the evaluation campaign of the patent machine translation task at NTCIR-9. In this year's evaluation, we participated in the patent machine translation task for Chinese-English. Here we mainly describe the overview of the system, the primary modules, the key techniques and the evaluation results.
Publication date: 2008